A Significance Test for the Lasso.

Authors

  • Richard Lockhart
  • Jonathan Taylor
  • Ryan J Tibshirani
  • Robert Tibshirani
Abstract

In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a χ²₁ distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than χ²₁ under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the ℓ₁ penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties, adaptivity and shrinkage, and its null distribution is tractable and asymptotically Exp(1).
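
The contrast between adaptivity and shrinkage can be illustrated with a small simulation. The sketch below is not taken from this abstract; it rests on several assumptions: an orthonormal predictor matrix X, a known noise level sigma, the global null (no active variables), and the first-step form of the covariance statistic, T1 = λ1(λ1 − λ2)/σ², where λ1 ≥ λ2 are the first two knots of the lasso path (recalled from the body of the paper). With orthonormal X the knots are the two largest values of |Xᵀy|, so no lasso solver is needed. The function name and parameters are hypothetical.

```python
import numpy as np

def first_step_statistics(p, n_sims, sigma=1.0, seed=0):
    """Simulate first-step statistics under the global null.

    Assumes orthonormal X, so that X^T y ~ N(0, sigma^2 I_p) and the
    first two lasso knots are the two largest absolute inner products.
    """
    rng = np.random.default_rng(seed)
    u = np.abs(rng.normal(0.0, sigma, size=(n_sims, p)))
    u.sort(axis=1)               # ascending within each simulated data set
    lam1 = u[:, -1]              # first knot: largest |X_j^T y|
    lam2 = u[:, -2]              # second knot: second largest |X_j^T y|
    # Covariance statistic for the first predictor to enter (assumed form).
    cov_stat = lam1 * (lam1 - lam2) / sigma**2
    # Drop in RSS when the best single predictor is chosen greedily and
    # fitted by least squares (no shrinkage).
    rss_drop = lam1**2 / sigma**2
    return cov_stat, rss_drop

cov_stat, rss_drop = first_step_statistics(p=1000, n_sims=5000)
print("mean covariance statistic (Exp(1) has mean 1):", cov_stat.mean())
print("mean greedy RSS drop (chi-squared_1 has mean 1):", rss_drop.mean())
```

Under these assumptions, the average covariance statistic should sit near 1, in line with the Exp(1) limit, while the average greedy drop in RSS should be far above the χ²₁ mean of 1, which is the failure of the naive chi-squared test that the abstract describes.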

Similar resources

Discussion: “A Significance Test for the Lasso”

1. A short description of the test procedure. We start by presenting the proposed test procedure in a slightly different form than in the paper. Let β̂(λ) := arg min_β ½‖y − Xβ‖₂² + λ‖β‖₁ be the Lasso estimator with tuning parameter equal to λ. The paper uses the Lasso path {β̂(λ) : λ > 0} to construct a test statistic for the significance of certain predictor variables. For a subset S ⊆ {1, . . . , p...

Penalized Lasso Methods in Health Data: application to trauma and influenza data of Kerman

Background: Two main issues that challenge model building are the number of events per variable and multicollinearity among explanatory variables. Our aim is to review statistical methods that tackle these issues, with emphasis on the penalized Lasso regression model. The present study aimed to explain problems of traditional regressions due to small sample size and m...

Discussion: “A Significance Test for the Lasso”

Professors Lockhart, Taylor, Tibshirani and Tibshirani are to be congratulated for their innovative and valuable contribution to the important and timely problem of testing the significance of covariates for the Lasso. Since the invention of the Lasso in Tibshirani (1996) for variable selection, there has been a huge growing literature devoted to its theory and implementation, its extensions to...

Discussion: “A Significance Test for the Lasso”

We wholeheartedly congratulate Lockhart, Taylor, Tibshirani and Tibshirani on the stimulating paper, which provides insights into statistical inference based on the lasso solution path. The authors proposed novel covariance statistics for testing the significance of predictor variables as they enter the active set, which formalizes the data-adaptive test based on the lasso path. The observation t...

Journal title:
  • Annals of Statistics

Volume 42, Issue 2

Pages: -

Publication date: 2014